AdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks
نویسندگان
چکیده
Training deep neural networks with Stochastic Gradient Descent, or its variants, requires careful choice of both learning rate and batch size. While smaller batch sizes generally converge in fewer training epochs, larger batch sizes offer more parallelism and hence better computational efficiency. We have developed a new training approach that, rather than statically choosing a single batch size for all epochs, adaptively increases the batch size during the training process. Our method delivers the convergence rate of small batch sizes while achieving performance similar to large batch sizes. We analyse our approach using the standard AlexNet, ResNet, and VGG networks operating on the popular CIFAR10, CIFAR-100, and ImageNet datasets. Our results demonstrate that learning with adaptive batch sizes can improve performance by factors of up to 6.25 on 4 NVIDIA Tesla P100 GPUs while changing accuracy by less than 1% relative to training with fixed batch sizes.
منابع مشابه
Cystoscopy Image Classication Using Deep Convolutional Neural Networks
In the past three decades, the use of smart methods in medical diagnostic systems has attractedthe attention of many researchers. However, no smart activity has been provided in the eld ofmedical image processing for diagnosis of bladder cancer through cystoscopy images despite the highprevalence in the world. In this paper, two well-known convolutional neural networks (CNNs) ...
متن کاملAn adaptive estimation method to predict thermal comfort indices man using car classification neural deep belief
Human thermal comfort and discomfort of many experimental and theoretical indices are calculated using the input data the indicator of climatic elements are such as wind speed, temperature, humidity, solar radiation, etc. The daily data of temperature، wind speed، relative humidity، and cloudiness between the years 1382-1392 were used. In the First step، Tmrt parameter was calculated in the Ray...
متن کاملLayer Normalization
Training state-of-the-art, deep neural networks is computationally expensive. One way to reduce the training time is to normalize the activities of the neurons. A recently introduced technique called batch normalization uses the distribution of the summed input to a neuron over a mini-batch of training cases to compute a mean and variance which are then used to normalize the summed input to tha...
متن کاملL1-Norm Batch Normalization for Efficient Training of Deep Neural Networks
Batch Normalization (BN) has been proven to be quite effective at accelerating and improving the training of deep neural networks (DNNs). However, BN brings additional computation, consumes more memory and generally slows down the training process by a large margin, which aggravates the training effort. Furthermore, the nonlinear square and root operations in BN also impede the low bit-width qu...
متن کاملNormalized Gradient with Adaptive Stepsize Method for Deep Neural Network Training
In this paper, we propose a generic and simple algorithmic framework for first order optimization. The framework essentially contains two consecutive steps in each iteration: 1) computing and normalizing the mini-batch stochastic gradient; 2) selecting adaptive step size to update the decision variable (parameter) towards the negative of the normalized gradient. We show that the proposed approa...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1712.02029 شماره
صفحات -
تاریخ انتشار 2017